Statistical Machine Translation and Cross-Language IR: QMUL at CLEF 2006
Author
Abstract
In this year’s CLEF submissions, we focus on using a state-of-the-art statistical machine translation approach for ad hoc cross-language retrieval. Our machine translation approach is phrase-based, as opposed to the statistical word-based approaches that have previously been used for query translation in cross-language IR. The phrase translation probabilities were estimated using the Europarl corpus. For query formulation, we also use the n-best lists of translation candidates to assign weights to query terms. Our results show that a statistical phrase-based approach is a competitive alternative to commercial, rule-based machine translation approaches in the context of cross-language IR.
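The abstract does not give the formula used to turn n-best lists into query term weights. A minimal sketch of one plausible scheme, assuming each candidate translation carries a log-score from the decoder: the scores are normalized into posteriors with a softmax over the n-best list, and each term's weight is the total posterior mass of the candidates that contain it. The function name `nbest_term_weights` and the example candidates are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict


def nbest_term_weights(nbest):
    """Weight query terms using an n-best list of translations (hypothetical scheme).

    nbest: list of (translation, log_score) pairs.
    Returns a dict mapping each term to the summed posterior probability of
    the candidate translations in which it appears.
    """
    if not nbest:
        return {}
    # Softmax over the log-scores; subtracting the max avoids overflow.
    max_score = max(score for _, score in nbest)
    exp_scores = [math.exp(score - max_score) for _, score in nbest]
    total = sum(exp_scores)

    weights = defaultdict(float)
    for (translation, _), e in zip(nbest, exp_scores):
        posterior = e / total
        # Count each term once per candidate so weights stay within [0, 1].
        for term in set(translation.lower().split()):
            weights[term] += posterior
    return dict(weights)


nbest = [
    ("european parliament elections", -1.0),
    ("european parliament election", -1.5),
    ("elections to the european parliament", -2.0),
]
weights = nbest_term_weights(nbest)
```

Terms shared by every candidate ("european", "parliament") receive a weight of about 1.0, while translation variants such as "elections" and "election" split the remaining mass according to their candidates' scores. In practice, stopwords such as "to" and "the" would be removed before the weights are used in the query.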
Similar resources
LCC's PowerAnswer at QA@CLEF 2006
This paper reports on Language Computer Corporation’s first QA@CLEF participation. For this exercise, we integrated our open-domain PowerAnswer question answering system with our statistical machine translation engine. For 2006, we participated in the English-to-Spanish, French and Portuguese cross-language tasks. We took the approach of intermediate translation, only processing English within ...
Domain Adaptation of Statistical Machine Translation Models with Monolingual Data for Cross Lingual Information Retrieval
Statistical Machine Translation (SMT) is often used as a black-box in CLIR tasks. We propose an adaptation method for an SMT model relying on the monolingual statistics that can be extracted from the document collection (both source and target if available). We evaluate our approach on CLEF Domain Specific task (German-English and English-German) and show that very simple document collection st...
Task3 Patient-Centred Information Retrieval: Team CUNI
In this paper we present the participation of the Charles University team in Task3 Patient-Centred Information Retrieval. In the monolingual task and its subtasks, we submitted two runs: one based on a language model approach and the other based on the vector space model. For the multilingual task, Khresmoi translator, a Statistical Machine Translation (SMT) system, is used to trans...
Dublin City University at CLEF 2007: Cross Language Speech Retrieval (CL-SR) Experiments
Dublin City University participated in the CLEF 2007 CL-SR English task. For CLEF 2007 we concentrated primarily on the issues of topic translation, combining this with the search field combination and pseudo relevance feedback methods used for our CLEF 2006 submissions. Topics were translated into English using the Yahoo! BabelFish free online translation service combined with domain-specific ...
Unsupervised Morpheme Analysis Evaluation by IR experiments - Morpho Challenge 2007
This paper presents the evaluation of Morpho Challenge Competition 2 (information retrieval). Competition 1 (linguistic gold standard) is described in a companion paper. In Morpho Challenge 2007, the objective was to design statistical machine learning algorithms that discover which morphemes (smallest individually meaningful units of language) words consist of. Ideally, these are basic voc...